Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish
Cohen, Kevin, Manrique-Gómez, Laura, Manrique, Rubén
This study explores the use of large language models (LLMs) to enhance datasets and improve irony detection in 19th-century Latin American newspapers. Two strategies were employed to evaluate how effectively BERT and GPT-4o models capture the subtle, nuanced nature of irony, in both multi-class and binary classification tasks. First, we implemented dataset enhancements focused on enriching emotional and contextual cues; however, these had limited impact on the analysis of historical language. The second strategy, a semi-automated annotation process, effectively addressed class imbalance and augmented the dataset with high-quality annotations. Despite the challenges posed by the complexity of irony, this work makes two key contributions to sentiment analysis: a new historical Spanish dataset tagged for sentiment analysis and irony detection, and a semi-automated annotation methodology in which human expertise is crucial for refining LLM results, enriched by incorporating historical and cultural contexts as core features.
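The abstract does not say how the semi-automated annotation process selects items to fix class imbalance, so the following is only a minimal sketch of one plausible human-in-the-loop scheme, not the authors' method. It assumes an LLM has already proposed a label and a confidence for each unlabeled text; `build_review_queue`, `merge_reviewed`, the `(text_id, llm_label, confidence)` record format, and the `ironic`/`non_ironic` labels are all hypothetical. Under-represented classes are topped up from the most confident LLM proposals, and only labels confirmed or corrected by a human reviewer enter the dataset.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record format: (text_id, llm_label, llm_confidence).
Proposal = Tuple[str, str, float]


def build_review_queue(gold: Dict[str, str], proposals: List[Proposal]) -> List[Proposal]:
    """Select LLM-proposed items for under-represented classes.

    Each minority class is topped up (if enough proposals exist) to the size
    of the current largest class; higher-confidence proposals are queued first.
    """
    counts = Counter(gold.values())
    target = max(counts.values()) if counts else 0
    queue: List[Proposal] = []
    for label in sorted({p[1] for p in proposals} | set(counts)):
        deficit = target - counts.get(label, 0)
        if deficit <= 0:
            continue  # class is already at or above the target size
        pool = sorted((p for p in proposals if p[1] == label), key=lambda p: p[2], reverse=True)
        queue.extend(pool[:deficit])
    return queue


def merge_reviewed(
    gold: Dict[str, str], queue: List[Proposal], human_verdicts: Dict[str, str]
) -> Dict[str, str]:
    """Add queued items that a human reviewed; the human label overrides the LLM's.

    Items missing from human_verdicts are treated as rejected and are not added.
    """
    merged = dict(gold)
    for text_id, _llm_label, _conf in queue:
        if text_id in human_verdicts:
            merged[text_id] = human_verdicts[text_id]
    return merged
```

Keeping the human verdict authoritative mirrors the abstract's point that human expertise is needed to refine LLM output; the balancing target (size of the largest class) is an arbitrary illustrative choice.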
- North America > Mexico (0.04)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models
Lorge, Isabelle, Joyce, Dan W., Taylor, Niall, Nevado-Holgado, Alejo, Cipriani, Andrea, Kormilitzin, Andrey
Difficult-to-treat depression (DTD) has been proposed as a broader and more clinically comprehensive perspective on a person's depressive disorder, in which the person continues to experience significant burden despite treatment. We sought to develop a Large Language Model (LLM)-based tool capable of interrogating routinely collected, narrative (free-text) electronic health record (EHR) data to locate published prognostic factors that capture the clinical syndrome of DTD. In this work, we use LLM-generated synthetic data (GPT-3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model. The resulting model can extract and label spans related to a variety of relevant positive and negative factors in real clinical data (i.e., spans of text that increase or decrease the likelihood of a patient matching the DTD syndrome). We show that, by training the model exclusively on synthetic data, it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data for a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD factors such as history of abuse, family history of affective disorder, illness severity, and suicidality. Our results show promise for future healthcare applications, especially those that would traditionally require highly confidential medical data and human-expert annotation.
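The abstract names Non-Maximum Suppression (NMS) but not the variant used. As a rough illustration only, the sketch below applies standard greedy NMS to text spans: candidates are sorted by score, and a span is kept only if its intersection-over-union (IoU) with every already-kept span is at most a threshold. The `Span` type, the 0.5 threshold, the class-agnostic suppression, and the factor names in the example are assumptions, not details from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    label: str    # factor name (hypothetical label set)
    score: float  # model or generator confidence


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two half-open character intervals."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def span_nms(spans: List[Span], iou_threshold: float = 0.5) -> List[Span]:
    """Greedy NMS: visit spans from highest to lowest score and keep a span
    only if it overlaps no kept span by more than iou_threshold.
    Suppression is class-agnostic (labels are ignored when comparing)."""
    kept: List[Span] = []
    for cand in sorted(spans, key=lambda s: s.score, reverse=True):
        if all(span_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)  # restore document order


if __name__ == "__main__":
    candidates = [
        Span(10, 40, "history_of_abuse", 0.92),
        Span(12, 38, "history_of_abuse", 0.80),  # IoU with (10, 40) is about 0.87
        Span(50, 70, "suicidality", 0.88),
    ]
    # Keeps (10, 40) and (50, 70); (12, 38) is suppressed.
    print(span_nms(candidates))
```

A per-label variant (running NMS separately for each label) would be an equally reasonable choice if overlapping spans with different factor labels should both survive.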
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)